Organization

Datasets

For training models we support the following types of datasets:

  • chat: Like RyokoAI/ShareGPT52K or WizardLM/WizardLM_evol_instruct_V2_196k

    // Type: `chat`
    [
    	{"conversations": [{"from": "system", "value": "..."}]},
    	{"conversations": [{"from": "human", "value": "..."}]},
    	{"conversations": [{"from": "gpt", "value": "..."}]},
    	{"conversations": [{"from": "human", "value": "..."}]},
    	{"conversations": [{"from": "gpt", "value": "..."}]},
    	...
    ]
    // "from" can only be one of "system", "human", "gpt"
    // There can only be 1 "system" in starting of the conversation
    // After that, "human" and "gpt" alternate every message starting with human
    
  • instruct: Like vicgalle/alpaca-gpt4 or tatsu-lab/alpaca

    // Type: ``instruct``
    [
    	{"instruction": "...", "input": "...", "output": "..."},
    	{"instruction": "...", "input": "...", "output": "..."},
    	{"instruction": "...", "input": "...", "output": "..."},
    	...
    ]
    // Instruction is the task LLM needs to do
    // Input is the input to the task
    // Output is the expected generation
    
  • completion: Like Oasst or Wikitext

    // Type: `completion`
    [
    	{"text": "..."},
    	{"text": "..."},
    	...
    ]
    

For more help in dataset formats please check out this page.

Models

Finetune Jobs

Integrations

Additional